Flink 1.14: Fix the flaky testHashDistributeMode by ingesting all rows in one checkpoint cycle. #4189
Conversation
…in one checkpoint precisely.
I ran this 20 times on my host and everything seems OK:

```java
List<Row> dataSet = ImmutableList.of(
    Row.of(1, "aaa"), Row.of(1, "bbb"), Row.of(1, "ccc"),
    Row.of(2, "aaa"), Row.of(2, "bbb"), Row.of(2, "ccc"),
    Row.of(3, "aaa"), Row.of(3, "bbb"), Row.of(3, "ccc"));
String dataId = BoundedTableFactory.registerDataSet(ImmutableList.of(dataSet));
```
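For context, a sketch of how such a registered data set is typically wired into the Flink SQL source table in these tests; the connector identifier and column types below are assumptions for illustration, not verbatim from this PR:

```java
// Hypothetical wiring: create a bounded source table backed by the registered
// data set. The connector name 'BoundedSource' and the (id, data) schema are
// assumptions based on the shape of the test snippet above.
sql("CREATE TABLE %s (id INT NOT NULL, data STRING NOT NULL) " +
    "WITH ('connector'='BoundedSource', 'data-id'='%s')", SOURCE_TABLE, dataId);
```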
Shall we produce more than one checkpoint, and add enough records in each part instead of enumerating them?
I think a single checkpoint is good enough to validate the PartitionKeySelector; more checkpoints would make the unit test more complex while validating the same thing, in my view.
Mocking more records for the test data set sounds good to me.
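To make the point concrete, here is an illustrative stand-in for what a partition-key-based KeySelector does (the class below is a sketch, not Iceberg's actual PartitionKeySelector): keying rows by their partition value routes every row of a partition to the same writer subtask, so each partition yields one file per checkpoint cycle.

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.types.Row;

// Sketch: key rows by the partition column so all rows of one partition
// hash to the same writer subtask. Field 1 ("data") is assumed to be the
// partition column, matching the test's schema.
public class DataPartitionKeySelector implements KeySelector<Row, String> {
  @Override
  public String getKey(Row row) {
    return (String) row.getField(1);
  }
}
```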
| sql("INSERT INTO %s SELECT * FROM %s", tableName, SOURCE_TABLE); | ||
|
|
||
| Table table = validationCatalog.loadTable(TableIdentifier.of(icebergNamespace, tableName)); | ||
| SimpleDataUtil.assertTableRecords(table, ImmutableList.of( |
Check the records based on dataSet?
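One way to act on this suggestion (a sketch that reuses the test's existing context, assuming SimpleDataUtil offers a createRecord(Integer, String) helper matching the (id, data) schema):

```java
// Hypothetical alternative: derive the expected records from dataSet instead
// of enumerating them, so the assertion stays in sync with the source rows.
List<Record> expected = dataSet.stream()
    .map(row -> SimpleDataUtil.createRecord(
        (Integer) row.getField(0), (String) row.getField(1)))
    .collect(Collectors.toList());
SimpleDataUtil.assertTableRecords(table, expected);
```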
@openinx I have run the test hundreds of times locally in a loop, like you did before, and was never able to reproduce it. Thinking again about the root cause we discussed in the issue: we may miss the notifyCheckpointComplete callback. As a result, two checkpoint cycles get squashed into one Iceberg commit, leaving two files for a partition in a single Iceberg commit. I misunderstood the PR earlier; the change makes sure all rows go through one checkpoint cycle, which bypasses the potential problem with multiple checkpoint cycles.
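For readers unfamiliar with the failure mode, a simplified illustration of how a missed notifyCheckpointComplete squashes two cycles into one commit; this models the mechanism only and is not Iceberg's actual committer code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Results are buffered per checkpoint id and flushed on notifyCheckpointComplete.
// If the callback for checkpoint 1 is missed, the callback for checkpoint 2
// commits both cycles' files in ONE commit, leaving two files for the same
// partition in a single Iceberg commit, which is what the flaky assertion saw.
class BufferingCommitter {
  private final NavigableMap<Long, List<String>> pending = new TreeMap<>();

  void addResult(long checkpointId, String dataFile) {
    pending.computeIfAbsent(checkpointId, id -> new ArrayList<>()).add(dataFile);
  }

  void notifyCheckpointComplete(long checkpointId) {
    // Commit everything accumulated up to and including this checkpoint.
    NavigableMap<Long, List<String>> ready = pending.headMap(checkpointId, true);
    List<String> toCommit = new ArrayList<>();
    ready.values().forEach(toCommit::addAll);
    ready.clear();
    commit(toCommit); // one Iceberg commit, possibly covering multiple cycles
  }

  private void commit(List<String> dataFiles) {
    // append dataFiles to the table in a single commit
  }
}
```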
stevenzwu left a comment
LGTM.
nit: can we change the description to "by ingesting all rows in one checkpoint cycle"? Earlier I misunderstood the PR; I mistakenly thought we were still doing multiple checkpoint cycles and were just precisely controlling the rows in each cycle.
@stevenzwu The root cause is that the previous design could not guarantee a single checkpoint would commit all rows in a given transaction; here is another example. That's why this PR now tries to guarantee it. The new description looks good to me if you think it's clearer.
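As I read the fix (an assumption about the test utility's semantics, not something stated verbatim in this thread), BoundedTableFactory takes a List&lt;List&lt;Row&gt;&gt; in which each inner list is emitted within one checkpoint cycle, so passing a single inner list forces all rows into one cycle:

```java
// One inner list: all rows land in a single checkpoint cycle (the fix).
String dataId = BoundedTableFactory.registerDataSet(ImmutableList.of(dataSet));

// Two inner lists: rows split across two checkpoint cycles, the shape that
// could trip the assertion when a notifyCheckpointComplete is missed.
// firstBatch and secondBatch are hypothetical names for illustration.
String flakyDataId = BoundedTableFactory.registerDataSet(
    ImmutableList.of(firstBatch, secondBatch));
```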
@openinx Looks good. Can you merge this? It should be safe.
Thanks for fixing the flaky test, @openinx!
… one checkpoint cycle (apache#4189)
This PR fixes the flaky testHashDistributeMode unit test at its root. The following is the explanation of the current fix.